4 research outputs found

    Metric Selection and Metric Learning for Matching Tasks

    A quarter of a century after the World Wide Web was born, we have grown accustomed to having easy access to a wealth of data sets and open-source software. The value of these resources is limited if they are not properly integrated and maintained. Much of this work boils down to matching: finding existing records about entities and enriching them with information from a new data source. In the realm of code, this means integrating new code snippets into a code base while avoiding duplication. In this thesis, we address two such matching problems. First, we leverage the diverse and mature set of string similarity measures in an iterative semi-supervised learning approach to string matching. The approach queries a user for a sequence of decisions on specific cases of string matching. We show that we can find almost optimal solutions after only a small amount of such input. The low labelling complexity of our algorithm comes from addressing the cold start problem inherent to Active Learning: queries are ranked by variance before enough supervision information has arrived, and a self-regulating mechanism counteracts initial biases. Second, we address the matching of code fragments for deduplication. Programming code is not only a tool but also a resource that itself demands maintenance. Code duplication is a frequent problem, arising especially from modern development practice. There are many reasons to detect and address code duplicates, for example to keep a codebase clean and maintainable. For such more complex data structures, string similarity measures are inadequate. In their stead, we study a modern supervised Metric Learning approach that models code similarity with Neural Networks. We find that, in such a model, representing the elementary tokens with a pretrained word embedding is the most important ingredient.
Our results show both qualitatively (by visualization) that the embeddings model relatedness well and quantitatively (by ablation) that the encoded information is useful for the downstream matching task. As a non-technical contribution, we unify the common challenges arising in supervised learning approaches to Record Matching, Code Clone Detection and generic Metric Learning tasks. We give a novel account of string similarity measures from a psychological standpoint, and we document one longstanding naming conflict among string similarity measures. Finally, we point out the overlap of the latest research in Code Clone Detection with the field of Natural Language Processing.
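
The idea of ranking queries by variance before any labels are available can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes that "variance" means the disagreement among several string similarity measures on the same candidate pair, and the three measures and every function name below are illustrative choices rather than anything taken from the thesis.

```python
from difflib import SequenceMatcher
from statistics import pvariance


def ratio_similarity(a: str, b: str) -> float:
    """Similarity based on difflib's matching blocks, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of character bigram sets, in [0, 1]."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        # Both strings are too short to have bigrams.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def levenshtein_similarity(a: str, b: str) -> float:
    """One minus the edit distance normalized by the longer length."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


# Hypothetical pool of measures; a real system would likely use more.
MEASURES = (ratio_similarity, bigram_jaccard, levenshtein_similarity)


def rank_queries_by_variance(pairs):
    """Order candidate pairs so the most contested ones come first.

    Assumption: a pair on which the measures disagree strongly is the
    most informative one to show a user while no labels exist yet.
    """
    scored = [(pvariance([m(a, b) for m in MEASURES]), (a, b))
              for a, b in pairs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored]
```

Under these assumptions, a pair such as `("john smith", "smith john")` is queried before pairs on which all measures agree, for example identical strings or strings with nothing in common.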

    Π^0_1 Classes Boundedness and Degrees

    In this thesis we consider Π^0_1 classes, which can roughly be defined as the sets of infinite paths through computable trees. Historically, Π^0_1 classes were first considered by Shoenfield in an investigation of the complexity of complete extensions of computably axiomatizable first-order theories, such as Peano arithmetic. We aim for a comprehensive approach to the notion of Π^0_1 classes. First, we motivate the notion of tree used in the context of Π^0_1 classes, which is a special case of the common graph-theoretic notion. We find that, while formally a special case, its definition is reasonable, since it captures all the structural phenomena considered. Also, we consider Π^0_1 classes not only in 2^ω but in all of ω^ω. However, we restrict ourselves to Π^0_1 classes that are bounded in some sense. For this purpose, we investigate different notions of boundedness before identifying two somewhat universal classes of bounded Π^0_1 classes, then simply called bounded and computably bounded Π^0_1 classes, and a convenient way of representing them. After that, we turn to the core of the thesis: we investigate what spectra of degrees the members of a Π^0_1 class of a given kind can have. On the one hand, we try to find members of particularly low complexity in some sense. To do this, we establish basis theorems, which tell us that any Π^0_1 class belonging to a class C has a member in a class of functions B, called a basis. On the other hand, we showcase some Π^0_1 classes which show by example that any such basis must, in turn, contain a member with a certain property. Returning to the initial motivation for our consideration, we apply the results to logical theories, and to Peano arithmetic in particular, thereby enriching the findings of Gödel’s Incompleteness Theorem. An important result is that the class of complete extensions of Peano arithmetic forms a universal computably bounded Π^0_1 class.
That means that every such extension computes some member of every computably bounded Π^0_1 class. The degrees of these extensions, called PA degrees, can even be characterized by this property. We assemble other characterizations and results on PA degrees. A notable feature of this thesis is the generalization of Shoenfield’s construction for the improvement of the Kreisel Basis Theorem. This generalization seems not to have been carried out before. The originally resulting Kreisel-Shoenfield Basis Theorem is properly weaker than the Low Basis Theorem of Jockusch and Soare, but the generalization has more ramifications than just that theorem. For one, it implies that one can delete any maximal elements of a wide range of bases for any of the considered classes of Π^0_1 classes. This then shows that there are chains of low degrees of arbitrary finite length. The same is implied for hyperimmune-free degrees. As another application, the generalization of Shoenfield’s construction sharpens Solovay’s characterization of the PA degrees. Through this tightened characterization, it provides an immediate alternative proof of Scott and Tennenbaum’s result that there is no minimal degree that is PA. In fact, it implies the apparently novel result that there is an infinite chain of PA degrees below every PA degree.
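
For readers unfamiliar with the terminology, the rough definition given in the abstract can be written out as follows. This is a sketch in standard textbook notation, which is an assumption here since the thesis may use different symbols; the word "nonempty" in the last line is likewise added as the usual convention and is not stated in the abstract.

```latex
% A tree is a set T \subseteq \omega^{<\omega} closed under initial segments.
% Its set of infinite paths is
\[
  [T] = \{\, f \in \omega^{\omega} : \forall n \; (f \upharpoonright n \in T) \,\}.
\]
% A class P \subseteq \omega^{\omega} is a \Pi^0_1 class if it arises from a computable tree:
\[
  P \text{ is a } \Pi^0_1 \text{ class} \iff P = [T] \text{ for some computable tree } T.
\]
% The characterizing property of PA degrees described in the abstract, for a degree \mathbf{a}:
\[
  \mathbf{a} \text{ is PA} \iff
  \text{every nonempty computably bounded } \Pi^0_1 \text{ class } P
  \text{ has a member } f \text{ with } \deg(f) \le \mathbf{a}.
\]
```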

    Sports economics and sports management between business science and sport science?


    Nickel, palladium and platinum, survey covering the years 1984 and 1985
